# Reinforcement learning optimization

Polaris 4B Preview F32 GGUF

Polaris is an open-source post-training method that uses reinforcement learning to refine a model and strengthen its reasoning capabilities.

Large Language Model

Transformers English

Longwriter Zero 32B I1 GGUF

A quantized version of the THU-KEG/LongWriter-Zero-32B base model that supports both Chinese and English and is suited to long-context scenarios such as reinforcement learning and writing.

Large Language Model

Transformers Supports Multiple Languages

Longwriter Zero 32B GGUF

A statically quantized, multilingual version of the original LongWriter-Zero-32B model, suited to long-context scenarios such as reinforcement learning and writing.

Large Language Model

Transformers Supports Multiple Languages

Acereason Nemotron 1.1 7B GGUF

A high-performance 7B-parameter language model released by NVIDIA that focuses on mathematical and code reasoning tasks and supports a 128k context length.

Large Language Model Supports Multiple Languages

lmstudio-community

Kimi-Dev-72B is an open-source large language model for coding and software engineering tasks, achieving the best results among open-source models on SWE-bench Verified.

Large Language Model

Transformers Other

ContentV is an efficient video generation framework that achieves high-quality video generation with limited computing resources through a minimalist architecture, a multi-stage training strategy, and a cost-effective reinforcement learning framework.

Video Processing

Qwenlong L1 32B

QwenLong-L1 is a long-context reasoning model trained with reinforcement learning, demonstrating excellent performance across seven long-context document QA benchmarks.

Large Language Model

Verireason Codellama 7b RTLCoder Verilog GRPO Reasoning Tb

VeriReason is a Verilog RTL code generation method that combines reinforcement learning with testbench feedback, significantly improving the performance of pre-trained models in the field of hardware design.

Large Language Model

INTELLECT 2 GGUF

INTELLECT 2 is a large language model released by PrimeIntellect. It supports a context length of 40,960 tokens, is built on the QwQ architecture, and was trained with the GRPO reinforcement learning framework.

Large Language Model

lmstudio-community

Deephermes Financial Fundamentals Prediction Specialist Atropos

This is an experimental financial analysis model optimized for financial fundamentals prediction through the Atropos reinforcement learning framework.

Large Language Model

Transformers English

A model fine-tuned from Qwen/Qwen2.5-1.5B-Instruct using the TinyV reward system, which provides more accurate reward signals during reinforcement learning (RL) post-training and significantly improves both RL efficiency and the performance of the final model.

Large Language Model

The Camel Model is a text generation model based on the transformer architecture, supporting Azerbaijani and trained using reinforcement learning.

Large Language Model

Transformers Other

Community Request 01 12B

A pre-trained language model merged from multiple Captain-Eris series models using the mergekit tool.

Large Language Model

STILL 3 1.5B Preview

STILL-3-1.5B-preview is a slow-thinking model enhanced with reinforcement learning, achieving 39.33% accuracy on the AIME benchmark.

Large Language Model

Codet5 Large Ntp Py

CodeT5 is a large-scale encoder-decoder model pre-trained with a next-token prediction (NTP) objective on Python, focusing on code understanding and generation tasks.

Large Language Model

Ppo BreakoutNoFrameskip V4

This is a reinforcement learning agent based on the PPO algorithm, specifically designed for training and evaluation in the BreakoutNoFrameskip-v4 game environment.

Image Generation


© 2025 AIbase